[57] ABSTRACT

A reversible Discrete Cosine Transform (DCT) is described. 
The reversible DCT may be part of a compressor in a 
system. The system may include a decompressor with a 
reversible inverse DCT for lossless decompression or a 
legacy decompressor with an inverse DCT for lossy decompression.

REVERSIBLE DCT FOR LOSSLESS-LOSSY COMPRESSION

FIELD OF THE INVENTION

The present invention relates to the field of compression and decompression systems; more particularly, the present invention relates to lossless discrete cosine transform (DCT)-based compression which is reversible.

BACKGROUND OF THE INVENTION

The Discrete Cosine Transform (DCT) is an irrational transform commonly used in lossy image compression. It is used in many modes of the JPEG standard and the MPEG standards and future HDTV in the United States. For a discussion on the various standards, see ISO Standard documents ISO/IEC 10918 (JPEG), 11172 (MPEG 1), 13818 (MPEG 2) and William B. Pennebaker and Joan L. Mitchell, "JPEG Still Image Data Compression Standard," 1993. The basis vectors of DCT have irrational values. Theoretically, integer inputs result in irrational transform coefficients. Therefore, infinite precision is required to perform those transforms exactly. For use in compression, transform coefficients must be rounded to a finite representation.

With most transform implementations, rounding coefficients to integers does not guarantee that every unique integer input results in a different output. Therefore, the inverse DCT cannot reconstruct the input exactly. The error due to forward and inverse DCT transforms without quantization is referred to as systemic error. This systemic error prevents DCT implementations from being used in lossless compression without retaining a difference or error image.

In practical DCT implementations, the transform basis vectors are also rounded. The difference between a given implementation and the ideal transform (or a high accuracy floating point implementation) is referred to as mismatch. Low mismatch is required for data interchange. There can be a trade-off between the amount of mismatch and speed, cost and other desirable features.

A parameterized transform referred to herein as the Allen Parameterized Transform (APT) is a family of fast transforms which can implement the DCT or rational transforms that are arbitrarily close to the DCT. The APT is also referred to as a generalized Chen transform (GCT). For more information on the APT, see J. Allen, "Generalized Chen Transform: A Fast Transform for Image Compression," Journal of Electronic Imaging, Vol. 3(4), October 1994, pgs. 341-347; J. Allen, "An Approach to Fast Transform Coding in Software," Signal Processing: Image Communication, Vol. 8, pp. 3-11, 1996; and U.S. Pat. No. 5,129,015.

The present invention provides a reversible block based transform, such as, for example, a reversible DCT. The DCT of the present invention may be included in a DCT-based compressor/decompressor that may be used in a lossless compression/decompression system. The present invention also provides DCT transforms with no systemic error and no (or low) mismatch.

SUMMARY OF THE INVENTION

A reversible Discrete Cosine Transform (DCT) is described. The reversible DCT may be part of a compressor in a system. The system may include a decompressor with a reversible inverse DCT for lossless decompression or a legacy decompressor with an inverse DCT for lossy decompression.

In one embodiment, the compressor comprises a DCT compressor having multiple rotations (e.g., 2 point (2x2) integer rotations), a 4 point parametrized transform, and a subsidiary matrix. The 4 point transform comprises a rotation by B, while the subsidiary matrix comprises a rotation by A and a rotation by C.

The present invention also provides a method for creating [illegible] rounding offsets for use in a reversible DCT.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.

FIG. 1A is a block diagram of one embodiment of a lossless and lossy DCT based compression system.

FIG. 1B is a block diagram of one embodiment of the compressor of the present invention.

FIG. 1C is a block diagram of an alternate embodiment of the compressor of the present invention.

FIG. 2 is a block diagram of one embodiment of a video authoring system.

FIG. 3A illustrates a block diagram of one embodiment of a one dimensional (1D), 8-point forward parameterized transform.

FIG. 3B illustrates intermediate values in a parameterized transform which have the same scale factor.

FIG. 4 illustrates the Hein form of the subsidiary matrix of FIG. 3A.

FIG. 5A is a block diagram of one embodiment of an 8-point Hadamard transform according to the present invention.

FIG. 5B is a block diagram of one embodiment of an 8-point Haar transform according to the present invention.

FIG. 5C is a block diagram of one embodiment of a 4-point Sine transform according to the present invention.

FIG. 5D is a block diagram of one embodiment of a 4-point Slant transform according to the present invention.

FIG. 6 illustrates one embodiment of a forward ladder filter of a 2-point rotation.

FIG. 7 illustrates one embodiment of an inverse ladder filter of a 2-point rotation.

FIG. 8 illustrates a portion of the mapping for a 45° rotation.

FIG. 9 illustrates extras ("+") and collisions ("o") for 45° rotation.

FIG. 10 is a plot of collisions and extras in a 2,1 almost balanced transform.

FIG. 11 illustrates one embodiment of a look up table of part of a "B" 2-point rotation.

FIG. 12 is a block diagram of one embodiment of a rotation according to the present invention.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

A reversible DCT-based compression/decompression apparatus and method are described. In the following detailed description of the present invention numerous specific details are set forth, such as types of transforms, coefficient sizes, etc., in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices. Such computer systems typically employ one or more processors to process data, which are coupled to one or more memories via one or more buses.

The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.

Overview of the Present Invention

The present invention provides a reversible transform that allows DCT-based compression to be lossy and lossless. Reversible Transforms are efficient transforms implemented with integer arithmetic whose compressed results can be reconstructed into the original. One embodiment of the reversible transform is an extension of the APT.

The reversible transforms of the present invention are efficient (or almost efficient), in that there is no redundancy in the least significant bits of the coefficients. That is, the transforms of the present invention are efficient in that they do not require a large number of bits of precision (which would otherwise be used in an attempt to eliminate the systemic error). Efficiency leads to better lossless compression than using a non-reversible transform with a difference image. Several methods of constructing reversible APT implementations are described below. The reversible APT has many applications, such as in video authoring systems.

While transform coefficients may be rounded to any degree of precision, the present invention rounds transform coefficients to integers. Rounding more coarsely than to integers eliminates information and is a type of quantization. Rounding more finely than to integers introduces redundancy in the least significant bits of the transform coefficients, hindering compression.

The present invention provides DCT transforms with no systemic error. Because there is no systemic error, the transforms are reversible, or lossless, transforms. These reversible transforms can be used for lossless compression.

The present invention provides reversible DCT transforms which have low mismatch. Minimum quantization matrices are given for ±1 mismatch of quantized coefficients. If the minimum or greater quantization is used, the resulting coefficients can be used with any inverse DCT.

The present invention may be implemented in hardware or software, or a combination of both.

System Overview

The reversible DCT of the present invention may be used in lossless or lossy systems containing either a reversible inverse DCT to obtain exactly what was originally input or containing a prior art (not reversible) inverse DCT. A prior art DCT would be able to take the output of the reversible DCT because of its low enough mismatch with the true DCT to obtain exactly the same result as the normal DCT. In other words, an MPEG or JPEG decoder with a legacy DCT decoder may be used with the reversible DCT of the present invention.

FIG. 1A is a block diagram of one embodiment of a lossless and lossy DCT based compression system. Note that although the present invention is described at times in terms of a DCT-based system, the present invention is applicable to other block-based transforms. Referring to FIG. 1A, input data 100 (e.g., an input image) is compressed with a reversible DCT based compressor 101 of the present invention. The input image 100 may be retrieved from a memory or received from a channel, both of which have not been shown to avoid obscuring the present invention. The input image may be generated by a camera, a digitizer, a scanner, a frame grabber, or other well-known or similarly functioning device. The results of the compression are a plurality of coefficients that may be output to a channel or to a storage device. Note that other types of compression may follow or precede the use of the DCT-based compressor.

For lossless decompression, a decompressor with a reversible inverse DCT (IDCT) 102 is used on unquantized coefficients to exactly reconstruct the original. For lossy decompression, transform coefficients are quantized and then may be decompressed with a decompressor using any inverse DCT (IDCT) 103. As discussed above, the lossy decompressor may be a legacy system such as a JPEG or MPEG compliant decoder.

In one embodiment, the DCT compressor receives pixel components into a DCT transform, the output of which undergoes zigzag ordering to produce frequency coefficients. Thereafter, the coefficients typically undergo lossless entropy coding. Similarly, in the decompressor, frequency coefficients undergo lossless entropy decoding and then are input to a zigzag unordering block and thereafter to an inverse DCT transform to retrieve the pixel components.

Compressor 101 is shown coupled to decompressors 102 and 103 through a channel or storage device 104. They may not be physically coupled together at all. That is, data may be compressed and stored using the compressor and an entirely separate decompression system may access the information, or copies thereof, to access the compressed information. In this manner, the channel or storage is transparent.

FIG. 1B is a block diagram of one embodiment of the compressor of the present invention. Referring to FIG. 1B, the compressor includes color space or subsampling block 121 which performs color space conversion or subsampling of the input data. This is an optional block and may not be included in some compressors. In one embodiment, the color space conversion performed by block 121 is reversible.

The output of color space/subsampling block 121 is coupled to reversible DCT 122. The transformed values output from reversible DCT 122 are coupled to the input of zigzag ordering block 123, which performs well-known zigzag ordering techniques. It should be noted that this zigzag ordering block 123 is also optional. The output of zigzag ordering block 123 is coupled to run length block 124 which identifies run lengths of zeros. The output of run length block 124 is coupled to the input of Huffman coder 125, which performs Huffman coding. The output of the Huffman coding block is coupled to the input of signaling block 126 which sets forth the signaling for the decoder to indicate to the decoder what type of quantization or decoding options were taken to enable the decoder to effectively decode the encoded data. In one embodiment, signaling block 126 generates a header that precedes the compressed data and indicates to the decoder the information to enable decoding.

Optionally, quantization with scale factors may be applied after the reversible DCT block 122 and prior to zigzag ordering block 123. Such quantization with scale factors is described in more detail below.

FIG. 1C is a block diagram of an alternate embodiment of the compressor of the present invention. Referring to FIG. 1C, color space/subsampling block 121 is coupled to reversible DCT 122. The output of reversible DCT 122 is coupled to an optional quantization with scale factors block 127. The output of the quantization with scale factors block 127 is coupled to the input of context model 133. Context model 133 produces the context for the data. These contexts are forwarded to the probability estimation machine (PEM) 134. PEM 134 generates probability estimates for the data based on the contexts received. These probability estimates are output to bit stream generator (BG) 135 which generates the output bit stream based on the contexts from context model 133 and the probability estimates from PEM 134. The output of bit stream generator 135 is coupled to signaling block [reference numeral illegible].

FIG. 2 is a block diagram of an example application of the invention to a video authoring system. Referring to FIG. 2, one or more input devices, such as cameras 201, capture or obtain video images. During capture, video is compressed losslessly by lossless compressor 202, which is coupled to camera(s) 201. In one embodiment, this allows approximately a factor of two savings in bandwidth and storage while not introducing any degradation. In other words, there are no artifacts. Although the eventual target compression ratio for video is typically 100:1 lossy compression, initial lossless compression preserves information for enhancement, digital effects and any frame to be a high quality I-frame (for video compression). I-frames are well-known in the art of video compression and are compressed like a still image according to such compression standards as MPEG and HDTV.

The output of compressor 202 is coupled to extraction block 203. For editing, quantized DCT coefficients can be extracted by extraction block 203 (transcoded if necessary) and fed to a motion JPEG decompressor 204 to which it is coupled. Extraction block 203 may operate by determining which frames are desired and/or selecting only a certain number of bits of each coefficient. For instance, Table 13 described below indicates the number of bits to discard (in other words, which bits to keep). In another embodiment, extraction block 203 may only select some of the blocks, thereby clipping the image.

The motion JPEG decompressor 204 may be a low cost device or any legacy device. Because the compressed data is already in the DCT transform domain, the computation required to try different cuts and to experiment with different quantizations is reduced. Thus, this embodiment allows information to be viewed as video in real time, without very much extra processing.

After the editor has decided what information to keep (e.g., what frames will be in the final version), a lossless decompressor, such as decompressor 205, can be used to recover the original data. Note that the data may be retrieved from a store that contains only the edited data or which contains all or some portion of the original input data. In this case, some logic or processing would be needed to access the correct information for decompression. This processing/logic would be well-known to one skilled in the art.

An enhancement mechanism 206 may be coupled to decompressor 205 to enhance or preprocess the original data, if necessary, without the possibility of compression artifacts being exaggerated. Examples of such enhancement or preprocessing include enlarging part of an image, interpolating between frames to do slow motion, sharpening, noise reduction, etc. These and other enhancement mechanisms are well-known in the art.

After any enhancement, a compressor 207 performs a full MPEG compression with motion compensation to generate the final compressed data. Compressor 207 is well-known in the art.

While the reversible DCT could be used with any lossless decompression application, it is most useful when used with legacy lossy DCT based systems. Other unified lossy/lossless compression systems such as compression with reversible wavelets might be used if legacy system compatibility is not important.

The present invention may be extended to any block
based transform or any non-overlapped transform that can
be decomposed into 2-point rotations. For example, the
present invention may be used with the following
transforms: DFT/unitary DFT, cosine, sine, Hadamard, Haar,
Slant, Karhunen-Loeve, Fast KL, Sinusoidal transforms, a
SVD transform, lapped orthogonal transform (LOT), as
well as others. See Jain, Anil K., Fundamentals of Digital
Image Processing, Prentice-Hall, Inc. 1989, pgs. 132-138. Given
these examples and the examples described below, it would
be apparent to one skilled in the art to implement other
transforms.

Fast DCT Decompositions With the APT 

The Allen Parametrized Transform (APT), formally
referred to as the Generalized Chen Transform (GCT),
reduces the DCT and a family of other related transforms to
a cascade of "integer rotations". In one embodiment, each
rotation comprises a transform with the absolute value for its
determinant being 1. The present invention obtains a
reversible DCT by decomposing the DCT into multiple
reversible components. Because the individual parts are reversible,
the DCT is reversible.
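
The idea that a cascade of individually reversible integer steps is itself reversible can be illustrated with a short sketch. This section does not specify how a single 2-point rotation is made reversible, so the example below assumes a three-step ladder (lifting) factorization of a plane rotation; the function names `ladder_forward` and `ladder_inverse` and the round-half-up rule are illustrative choices, not taken from the patent.

```python
import math

def _round(v):
    """Round to the nearest integer (halves go up); any fixed rule would do."""
    return math.floor(v + 0.5)

def _ladder_coeffs(theta):
    # Factor a rotation by theta into three shears:
    # [[1, p], [0, 1]] * [[1, 0], [u, 1]] * [[1, p], [0, 1]]
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    return p, u

def ladder_forward(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta."""
    p, u = _ladder_coeffs(theta)
    x = x + _round(p * y)   # step 1 changes only x
    y = y + _round(u * x)   # step 2 changes only y
    x = x + _round(p * y)   # step 3 changes only x
    return x, y

def ladder_inverse(x, y, theta):
    """Undo ladder_forward by reversing each step in the opposite order."""
    p, u = _ladder_coeffs(theta)
    x = x - _round(p * y)   # undo step 3
    y = y - _round(u * x)   # undo step 2
    x = x - _round(p * y)   # undo step 1
    return x, y

if __name__ == "__main__":
    theta = math.pi / 4
    fx, fy = ladder_forward(7, -3, theta)
    print((fx, fy), ladder_inverse(fx, fy, theta))
```

Each step adds a rounded function of one value to the other, so it can be subtracted back exactly regardless of the rounding rule. The whole cascade is therefore reversible even though its output only approximates the real-valued rotation.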

FIG. 3A illustrates a block diagram of a 1D, 8-point
forward APT. Most of the APT transform is composed of
two point rotations which are labeled by the arc tangent of
a rotation angle. (The rotations are shown as "clockwise"
and with their determinant equal to -1.)

Referring to FIG. 3A, the 8-point transform can be
grouped into four initial rotations by 45° (arctan=1) 340, a
4-point APT 320, and a "subsidiary matrix" 330. The
subsidiary matrix 330 contains two multiplications in
addition to multiple 2-point rotations.

Thus, there are three sets of rotations that form the 
forward APT transform. First, a set of 2-point (2x2) rotations 
301-304 provide an input stage. Outputs from each of 
rotations 301-304 are coupled to the inputs of four point 
APT block 320, which contains 2-point rotations 305-308, 
and subsidiary matrix 330, which contains 2-point rotations 
309 and 312-315 and multipliers 310 and 311. 

Specifically, rotation 301 receives inputs 0 and 7 corre- 
sponding to two input data samples and generates one output 
to an input of rotation 305 and another output to an input to 
rotation 312. Rotation 302 is coupled to receive input data 
samples 1 and 6 and provides two outputs, one coupled to an 
input of rotation 306 and one coupled to the input of rotation 
309. Rotation 303 is coupled to receive input data samples 
2 and 5 and generates two outputs, one of which is coupled 
to the other input to rotation 306 and another coupled to the 
other input to rotation 309. 

Rotation 304 is coupled to receive input data samples 3
and 4 and generates two outputs, one of which is coupled to
the other input of rotation 305 and the other of which is
coupled to an input to rotation 313.

Rotation 305 generates two outputs, one of which is 
coupled to an input of rotation 307 and the other is coupled 
to an input of rotation 308. Rotation 306 generates two 
outputs, one of which is coupled to the other input of rotation 
307 and the other of which is coupled to the other input to 
rotation 308. In response to these inputs, rotation 307 
generates the 0 and 4 outputs, while rotation 308 generates 
the 2 and 6 outputs. 

With respect to the subsidiary matrix, rotation 309
generates two outputs coupled to multiply-by-R blocks 310 and
311. The output of multiply-by-R block 310 is coupled to the
other input of rotation 312, while the output of
multiply-by-R block 311 is coupled to the other input to rotation 313.
In response to its inputs, rotation 312 generates two outputs,
one coupled to an input to rotation 314 and the other to an
input to rotation 315.
Similarly, in response to its inputs, rotation 313 generates
outputs which are coupled to an input to rotation 314 and an
input to rotation 315. In response to these inputs, rotations
314 and 315 generate the 1 and 7 outputs and the 3 and 5
outputs respectively. The A, B, and C rotations will be
described in more detail below. In one embodiment, each of
rotations 301-304 may be the S-transform. However, in such
a case, mismatch may suffer.

The subsidiary matrix shown in FIG. 3A is the Chen form.
An alternative due to Hein and Allen, described
by J. Allen in "Generalized Chen Transform: A Fast
Transform for Image Compression," Journal of Electronic
Imaging, Vol. 3(4), October 1994, pgs. 341-347, is shown in
FIG. 4. Referring to FIG. 4, the subsidiary matrix comprises
six rotations (by angle). A pair of rotations by 45°
(arctan=1), rotations 401 and 402, are coupled to receive two
inputs each and generate two outputs. One of the outputs of
each of rotations 401 and 402 is coupled to the inputs of
rotation 403, while the other two outputs of rotations 401
and 402 are coupled to the inputs to rotation 404. In response
to these inputs, rotations 403 and 404 generate two outputs.
Rotations 403 and 404 comprise the B rotation. One of the
outputs of each of rotations 403 and 404 is coupled to the
inputs of rotation 405 while the other output of each of
rotations 403 and 404 is coupled to the inputs to rotation
406. Rotations 405 and 406 comprise the A and C rotations
respectively. Rotations 405 and 406 each generate two
outputs, 1 and 7 and 5 and 3, respectively.

In one embodiment, the rotations are 2 point (2x2)
rotations. Each rotation may comprise a 2 point transform or
filter.

The outputs are scaled to match the DCT. That is, the
present invention generates outputs that require the use of
scale factors to change the outputs to match those that would
result had a floating-point DCT been used. For lossy
compression/decompression, scale factors may be combined
with quantization factors. The scale factor used for each
output can be determined from the product of the individual
scale factors for each 2-point rotation. For the 2-point
rotation matrix of the form shown in FIG. 3A, the scale
factor for both outputs of every rotation is given by the
following equation:

$$\text{scale factor} = \frac{1}{\sqrt{t^{2} + 1}}$$

where t is the value that labels the rotation (1, A, B or C).

For the following 2-point transforms:

$$\begin{bmatrix} a & b \\ b & -a \end{bmatrix}$$

the scale factor is

$$\frac{1}{\sqrt{a^{2} + b^{2}}}$$

For the following 2-point transforms:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

the scale factor is

$$\frac{1}{\sqrt{a^{2} + b^{2}}}$$

for an output due to a,b and

$$\frac{1}{\sqrt{c^{2} + d^{2}}}$$

for an output corresponding to c,d.
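
As a rough numerical check of the scale factors discussed above, the sketch below assumes that the scale factor of each output is the reciprocal of the Euclidean norm of the corresponding matrix row. The helper names `scale_factors` and `expansion` are hypothetical and are used only for illustration.

```python
import math

def scale_factors(a, b, c, d):
    """Scale factors for the outputs of the 2-point transform [[a, b], [c, d]].

    Assumption: each output is normalized by the Euclidean norm of its row.
    """
    return 1.0 / math.hypot(a, b), 1.0 / math.hypot(c, d)

def expansion(a, b, c, d):
    """Product of the two scale factors."""
    s0, s1 = scale_factors(a, b, c, d)
    return s0 * s1

if __name__ == "__main__":
    # Rotation by 1 (45 degrees): both outputs share the factor sqrt(1/2),
    # which the rational parameter R = 128/181 approximates.
    print(scale_factors(1, 1, 1, -1), 128 / 181)
    # Rotation of the form [[a, b], [b, -a]] with a = 5, b = 1.
    print(scale_factors(5, 1, 1, -5), expansion(5, 1, 1, -5))
```

For a matrix of the form [[a, b], [b, -a]] the two factors are equal and their product is 1/(a^2 + b^2), the reciprocal of the absolute value of the determinant.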

The separable two dimensional (2D), 64 point DCT
(APT) can be implemented with eight 1D, 8-point DCTs
(APTs), a transpose, and another eight 1D, 8-point DCTs
(APTs).
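
The row, transpose, row structure can be sketched in a few lines. The example below uses an ordinary floating-point orthonormal DCT-II as a stand-in for the 1D, 8-point transform; it is not the reversible APT, and the names `dct_1d`, `transpose` and `dct_2d` are illustrative.

```python
import math

def dct_1d(v):
    """Orthonormal 1D DCT-II of a sequence (floating point, not reversible)."""
    n = len(v)
    out = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n))
        out.append(c * acc)
    return out

def transpose(m):
    """Swap rows and columns of a square list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    """Eight 1D transforms on the rows, a transpose, then eight more.

    The result is indexed [frequency along rows][frequency along columns],
    because no second transpose is applied at the end.
    """
    first_pass = [dct_1d(row) for row in block]
    return [dct_1d(row) for row in transpose(first_pass)]

if __name__ == "__main__":
    flat = [[10] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # DC term of a constant block
```

A constant block puts all of its energy into the single DC coefficient, which gives a quick sanity check on the two passes.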

FIG. 3B illustrates intermediate values in the APT which
have the same scale factor. Referring to FIG. 3B, each
shaded line indicates inputs that have the same scale factor.
Using the same scale factors constrains the divisors in
two-point transforms. For example, most rotations by 1 (all
but 309) are not required to have the same scale factor for
both outputs, so an unbalanced transform such as the
S-transform could be used. In contrast, the cascade of a
rotation by 1 followed by multiplication by R (309) must
have the same scale factors on the output as the input.

Referring to FIG. 3B, the two inputs for rotation 307 have
the same scale factors. The two inputs to rotation 308 have
the same scale factors. The two inputs to rotation 314 have
the same scale factors and the inputs to rotation 315 have the
same scale factors. The inputs to rotations 305 and 306 are
the same. In the case of rotation 305, this would constrain
the upper branches from each of rotations 301 and 304. With
respect to rotations 312 and 313, not only do their inputs
have the same scale factors, but also all the lower branches
from each of rotations 301 and 304 have the same scale
factors. Because the lower branches
output from rotations 301 and 304 have the same scale
factors as the lower branch outputs of rotations 302 and 303,
the scale factors of all inputs to the subsidiary matrix
are the same.

The product of the scale factors of two outputs is the
amount of expansion. Therefore, to be efficient, the product
of both scale factors must ideally be 1 to prevent expansion.
In one embodiment, to be reversible, both scale factors are
1. In alternative embodiments, the scale factors can be
slightly different. For example, both scale factors could be 0.99.

Table 1 illustrates values for APT parameters for three
different embodiments. The first set of parameters are the
irrational numbers that result in the DCT. DCT
implementations (APT or other) for compression cannot use
the irrational values exactly. All actual DCT implementations
approximate irrational parameters, even if the approximations are
very accurate high precision floating point approximations.
The APT uses rational approximations with small integers,
which leads to tractable reversible implementations.




TABLE 1

APT parameters

Parameter  DCT                   APT 1             APT 2             APT 3
A          tan(7π/16) ≈ 5.0273   5/1 = 5.0000      5/1 = 5.0000      643/128 = 5.0234
B          tan(6π/16) ≈ 2.4142   12/5 = 2.4000     128/53 = 2.4151   128/53 = 2.4151
C          tan(5π/16) ≈ 1.4966   3/2 = 1.5000      3/2 = 1.5000      383/256 = 1.4961
R          sqrt(1/2) ≈ 0.7071    128/181 = 0.7072  128/181 = 0.7072  128/181 = 0.7072

Table 1 shows three sets of APT parameters which trade
off simplicity vs. mismatch. APT 1 is simple and good for
compression. The other examples, APT 2 and APT 3, are
closer approximations to the irrational transform. APT 3
meets the CCITT Rec. H.261 (IEEE Std 1180-1990)
accuracy test.
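
The trade-off between simplicity and closeness to the irrational DCT parameters can be checked numerically. The sketch below takes the rational values as read from Table 1, assuming B = 128/53 for APT 2, and compares them with the tangent and square-root values; the dictionary layout and the helper `total_abs_error` are illustrative only.

```python
import math
from fractions import Fraction

# Irrational parameters that give the exact DCT.
DCT = {
    "A": math.tan(7 * math.pi / 16),
    "B": math.tan(6 * math.pi / 16),
    "C": math.tan(5 * math.pi / 16),
    "R": math.sqrt(0.5),
}

# Rational parameter sets as read from Table 1 (APT 2's B is an assumption).
APT = {
    "APT 1": {"A": Fraction(5, 1), "B": Fraction(12, 5),
              "C": Fraction(3, 2), "R": Fraction(128, 181)},
    "APT 2": {"A": Fraction(5, 1), "B": Fraction(128, 53),
              "C": Fraction(3, 2), "R": Fraction(128, 181)},
    "APT 3": {"A": Fraction(643, 128), "B": Fraction(128, 53),
              "C": Fraction(383, 256), "R": Fraction(128, 181)},
}

def total_abs_error(params):
    """Sum of absolute differences between rational and DCT parameters."""
    return sum(abs(float(params[k]) - DCT[k]) for k in DCT)

if __name__ == "__main__":
    for name, params in APT.items():
        row = ", ".join(f"{k}={float(v):.4f}" for k, v in params.items())
        print(f"{name}: {row}  total error={total_abs_error(params):.4f}")
```

Under this simple measure the total parameter error shrinks from APT 1 to APT 3, at the cost of larger integers in the numerators and denominators.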

The choice of APT parameters is not the only source of 
mismatch in a reversible transform. Reversible, efficient 
transforms require careful (reversible) rounding to integers 
at each step of the transform. These rounding operations also 
cause mismatch. Some mismatch is unavoidable because
methods that result in the same coefficients as a floating
point implementation cannot be both efficient and reversible. Since
mismatch due to rounding usually dominates mismatch due 
to parameter choice, the APT 1 parameters may be a good 
choice. However, the techniques of the present invention 
could be applied to other parameters. 

It should be noted that these APT parameters may be 
adjusted to obtain other transforms, which may or may not 
be reversible. 

Reversible DCT Implementations 

In the present invention, each 2-point rotation in the APT
components is made reversible. By making each reversible,
the entire APT is made reversible because each step may be
reversed. In addition to efficiency, two other properties are
desirable: balanced scale factors and no internal rounding.

A 2-point transform has "balanced" scale factors if the 
scale factors for both outputs are the same. For a transform 
to be efficient (or almost efficient), its determinant is ±1 (or
almost ±1). If the determinant is constrained to be ±1, the 
product of the scale factor for the two outputs is 1. Having 
both scale factors equal to 1 is desirable. In another 
embodiment, one scale factor is the reciprocal of the other. 
In this case, one scale factor is greater than 1. In this manner, 
the determinant will be ±1. A scale factor greater than 1 
causes quantization, resulting in mismatch. 

Note that the scale factors in the equation given above are
for an APT that is not efficient so their product is not one.
Having both scale factors less than 1 leads to good rounding
allowing for low mismatch, but such a system is not
both reversible and efficient.

Rounding at each step allows reversibility. A 2-point
rotation with "no internal rounding" indicates that at most
two rounding operations at the output of each step are
performed. Some implementations have additional rounding
operations inside the step, for example the ladder filter 
implementations described below. Extra rounding opera- 
tions increase mismatch. 

When used for lossy compression, the DCT is unitary so 
the same transform can be used for both the forward 
transform and the inverse transform. However, in the present 
invention, for reversible implementations, the inverse trans- 
form inverts the rounding. 

Therefore, the inverse transform has the inverse data flow
of FIG. 3 with each forward 2-point transform and multiplication
replaced with the corresponding inverse transform.






Reversible Implementations Without Internal Rounding or 
Look Up Tables 

In one embodiment, the present invention provides 
reversible implementations of 2-point rotations and multi- 
plications that do not require internal rounding or look up 
tables. For parameters that have efficient, balanced 2-point 
rotations of this type, these are important building blocks. 
Some 2-point transforms are described below that are not
balanced or only almost efficient. Ladder filter and look up
table alternatives are provided.

The choice of offsets often controls achieving reversibility.
A discussion of how only some offsets result in reversible
transforms is given below.
"1": 1,1-transform, S-Transform-unbalanced

In one embodiment, the "1" blocks may be implemented
with the following transform where a and b are the inputs to
the forward transform and x and y are the outputs of the
forward transform:

$$x = \left\lfloor \frac{a+b}{2} \right\rfloor, \qquad a = x + \left\lceil \frac{y}{2} \right\rceil$$

$$y = a - b, \qquad b = x - \left\lfloor \frac{y}{2} \right\rfloor$$

Note that the floor (⌊·⌋) and ceiling (⌈·⌉) functions mean to
round towards negative infinity and positive infinity respectively.

In one embodiment, scale factors are √2 and 1/√2 respectively.
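
As an illustration, the following is a minimal Python sketch of this transform, assuming the usual S-transform form (x is the floored average, y is the difference). The function names s_forward and s_inverse are hypothetical.

```python
def s_forward(a: int, b: int) -> tuple[int, int]:
    """Forward S-transform: floored average and difference."""
    x = (a + b) // 2      # floor((a + b) / 2)
    y = a - b
    return x, y


def s_inverse(x: int, y: int) -> tuple[int, int]:
    """Inverse S-transform: the parity of y restores the bit lost by the floor."""
    a = x + (y + 1) // 2  # x + ceil(y / 2)
    b = x - y // 2        # x - floor(y / 2)
    return a, b
```

Because a+b and a-b always have the same parity, the least significant bit dropped by the floor in x is carried by y, which is why only one rounding is needed.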

"A":5,l-transform-unbalanced 

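The following is one embodiment of the "A" rotation.

$$x = \left\lfloor \frac{5a + b + 13}{26} \right\rfloor, \qquad a = 5x - 2 + \frac{y + 5\delta - 13}{26}$$

$$y = a - 5b, \qquad b = x + \frac{\delta - 5y - 13}{26}$$

where the recovered least significant bits (i.e., rounded away bits) are given by the
equation:

$$\delta = (5y - 13) \bmod 26$$

In this embodiment, scale factors are √26 and 1/√26
respectively.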
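
The recovery of the rounded away bits can be sketched in Python as follows. This is an illustrative sketch that assumes the forward form x = floor((5a+b+13)/26), y = a-5b; the function names are hypothetical, and b is recovered here from y = a-5b rather than from a separate closed form.

```python
def a51_forward(a: int, b: int) -> tuple[int, int]:
    """Unbalanced 5,1 rotation: only the sum output is divided and rounded."""
    x = (5 * a + b + 13) // 26
    y = a - 5 * b
    return x, y


def a51_inverse(x: int, y: int) -> tuple[int, int]:
    """Inverse: delta is the remainder discarded by the forward floor."""
    delta = (5 * y - 13) % 26              # always in 0..25 in Python
    a = 5 * x - 2 + (y + 5 * delta - 13) // 26
    b = (a - y) // 5                       # exact, since y = a - 5b
    return a, b
```

The remainder delta is fully determined by y because 5a+b+13 is congruent to 5y+13 modulo 26, and -13 is congruent to 13 modulo 26.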

"A ":5,1 -transform-balanced. Inefficient 

The following is an alternate embodiment of the "A" rotation.

$$x = \left\lfloor \frac{5a + b + 2}{5} \right\rfloor, \qquad a = \frac{25x + 5y - 12 + 5\delta_2 + \delta_1}{26}$$

$$y = \left\lfloor \frac{a - 5b + 2}{5} \right\rfloor, \qquad b = \frac{5x - 25y + 8 - 5\delta_1 + \delta_2}{26}$$

where

$$\delta = (-(25x + 5y - 12)) \bmod 26, \qquad \delta_1 = \delta \bmod 5, \qquad \delta_2 = \lfloor \delta / 5 \rfloor$$

Because it is balanced, this transform is more useful for lossy
compression than the balanced efficient version.

In this embodiment, scale factors are both 5/√26.
"A": 60,11-transform-balanced, efficient

Another alternative embodiment of the "A" rotation is as
follows:

$$x = \left\lfloor \frac{60a + 11b + 30}{61} \right\rfloor, \qquad a = \left\lfloor \frac{60x + 11y + 30}{61} \right\rfloor$$

$$y = \left\lfloor \frac{11a - 60b + 30}{61} \right\rfloor, \qquad b = \left\lfloor \frac{11x - 60y + 30}{61} \right\rfloor$$

This transform is balanced and efficient. This transform
uses 11, 60, 61, which is a Pythagorean triplet a, b, c with
b+1=c and a²=2b+1. However, 60/11=5.4545,
which is not a very good approximation for tan
7π/16=5.0273. Here, the closeness to the DCT has been
sacrificed for balance, efficiency and simplicity in computation.

In this case, scale factors are both 1.
"B": 12,5-transform
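The following is one embodiment for the "B" rotation:

$$x = \left\lfloor \frac{12a + 5b + 6}{13} \right\rfloor, \qquad a = \left\lfloor \frac{12x + 5y + 6}{13} \right\rfloor$$

$$y = \left\lfloor \frac{5a - 12b + 6}{13} \right\rfloor, \qquad b = \left\lfloor \frac{5x - 12y + 6}{13} \right\rfloor$$

This is both balanced and efficient. The numbers 5, 12, 13
are a Pythagorean triplet a, b, c with b+1=c and a²=2b+1. This
leads to a very good 4-point APT (DCT).

Scale factors are both 1.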

Note that offsets are very important for reversibility. The
choice of +6 for both offsets in the forward transform of the
equation above results in a reversible transform, while other
offsets do not. For example, if the offsets are both zero as in
the first set of equations below, then the inputs a=0, b=0 and
a=1, b=0 both result in x=0, y=0.

$$x = \left\lfloor \frac{12a + 5b}{13} \right\rfloor, \qquad y = \left\lfloor \frac{5a - 12b}{13} \right\rfloor$$

It is apparent from this result that the selection of the offset
can control reversibility.

Another example is shown below where both offsets are
+5. In this case, inputs a=4, b=0 and a=4, b=1 both result
in x=4, y=1.

$$x = \left\lfloor \frac{12a + 5b + 5}{13} \right\rfloor, \qquad y = \left\lfloor \frac{5a - 12b + 5}{13} \right\rfloor$$

The balanced, inefficient "A" transform (5,1) has determinant 26/25 = 1.04. Therefore,
its redundancy is log2 1.04 = 0.06 bits. It is inefficient, but
close enough to efficient to be useful for lossless compression.



Most pairs of offsets do not result in a reversible transform.
Of the 169 possible pairs of offsets (offsets are 0 ... 12), the
only 13 pairs of offsets that result in reversibility are 0,10;
1,5; 2,0; 3,8; 4,3; 5,11; 6,6; 7,1; 8,9; 9,4; 10,12; 11,7 and
12,2.
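
The dependence of reversibility on the offsets can be checked by brute force. The Python sketch below is illustrative only: it tests each of the 169 offset pairs for the 12,5,13 rotation by looking for two inputs that map to the same output within a finite window. The window size (span) and the function names are assumptions; a finite window suffices here because colliding inputs are always neighbors.

```python
from itertools import product


def b_forward(a: int, b: int, ox: int, oy: int) -> tuple[int, int]:
    """12,5,13 rotation with rounding offsets ox and oy."""
    return (12 * a + 5 * b + ox) // 13, (5 * a - 12 * b + oy) // 13


def is_reversible(ox: int, oy: int, span: int = 20) -> bool:
    """True if no two inputs in the window share an output pair."""
    seen = set()
    for a, b in product(range(-span, span + 1), repeat=2):
        out = b_forward(a, b, ox, oy)
        if out in seen:
            return False
        seen.add(out)
    return True


reversible = [(ox, oy) for ox, oy in product(range(13), repeat=2)
              if is_reversible(ox, oy)]
print(len(reversible), reversible)
```

Running the search reproduces the two counterexamples given above: offsets 0,0 and 5,5 are rejected, while 6,6 is accepted.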

"C": 3, 2-transform -unbalanced 
One embodiment of the "C" rotation is as follows:

$$x = \left\lfloor \frac{3a + 2b + 6}{13} \right\rfloor, \qquad a = 3x - 1 + \frac{2y - 5 + 3\delta}{13}$$

$$y = 2a - 3b, \qquad b = 2x - 1 + \frac{-3y + 1 + 2\delta}{13}$$

where

$$\delta = (8y + 6) \bmod 13$$

This is an efficient transform. In this case, scale factors are
√13 and 1/√13 respectively.

"C": 3,2-transform-unbalanced with Growth in Sum 

An alternate embodiment of the "C" rotation, which is
unbalanced with growth in the sum, yet efficient, is as
follows:

$$x = 3a + 2b, \qquad a = 2y + \frac{3x - 12 + 2\delta}{13}$$

$$y = \left\lfloor \frac{2a - 3b + 6}{13} \right\rfloor, \qquad b = 1 - 3y + \frac{2x + 5 - 3\delta}{13}$$

$$\delta = (6 + 5x) \bmod 13$$

In this case, scale factors are 1/√13 and √13 respectively.

It is convenient in unbalanced transforms to divide the
sum by the larger divisor. This leads to minimum growth in
coefficient size. However, the sum leads to more visually
relevant coefficients. Using the larger divisor on the difference
and allowing more growth in the sum leads to lower
mismatch in the more visually relevant coefficients.
"C": 4,3-transform-balanced

An alternate embodiment of the "C" rotation, which is
balanced, is as follows:

$$x = \left\lfloor \frac{4a + 3b + 2}{5} \right\rfloor, \qquad a = \left\lfloor \frac{4x + 3y + 2}{5} \right\rfloor$$

$$y = \left\lfloor \frac{3a - 4b + 2}{5} \right\rfloor, \qquad b = \left\lfloor \frac{3x - 4y + 2}{5} \right\rfloor$$

This transform is balanced and efficient. Again, the number
set 3, 4, 5 is a Pythagorean triplet a, b, c with b+1=c and
a²=2b+1. However, 4/3=1.3333 is not a very good approximation
for tan 5π/16=1.4966. Scale factors are both 1.
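
The balanced transforms built from Pythagorean triplets share one structure, which the following Python sketch illustrates for the 12,5,13 and 4,3,5 cases. The function name pythagorean_rotation and its parameter names are hypothetical; the offsets 6 and 2 are the ones given in the text, and the sketch assumes the inverse uses the same formula as the forward transform.

```python
def pythagorean_rotation(a: int, b: int, p: int, q: int, c: int, offset: int) -> tuple[int, int]:
    """Balanced rotation floor((p*a + q*b + offset)/c), floor((q*a - p*b + offset)/c)."""
    x = (p * a + q * b + offset) // c
    y = (q * a - p * b + offset) // c
    return x, y


def rotate_b(a: int, b: int) -> tuple[int, int]:
    """The "B" rotation: triplet 5, 12, 13 with offset +6."""
    return pythagorean_rotation(a, b, 12, 5, 13, 6)


def rotate_c(a: int, b: int) -> tuple[int, int]:
    """The balanced "C" rotation: triplet 3, 4, 5 with offset +2."""
    return pythagorean_rotation(a, b, 4, 3, 5, 2)
```

Applying either function twice returns the original inputs, so the same routine can serve as both forward and inverse transform for these two parameter sets.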
The Multiplier "R"

In one embodiment, the multiplication factor uses an
integer approximation of √2. The R factor normalizes the
subsidiary matrix.

$$x = \left\lfloor \frac{256a + 90}{181} \right\rfloor, \qquad a = \frac{181x - 90 + \delta}{256}$$

$$\delta = (90 - 181x) \bmod 256, \qquad 0 \le \delta < 181$$
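
A Python sketch of a reversible multiplier of this kind follows. It is an assumption-laden illustration: it takes the forward form x = floor((256a+90)/181) and recovers the discarded remainder from x modulo 256. The function names are hypothetical.

```python
def r_forward(a: int) -> int:
    """Multiply by approximately sqrt(2) using the rational 256/181."""
    return (256 * a + 90) // 181


def r_inverse(x: int) -> int:
    """Undo r_forward by reconstructing the remainder delta of the forward division."""
    delta = (90 - 181 * x) % 256   # equals the forward remainder, which is below 181
    return (181 * x + delta - 90) // 256
```

Since the multiplier expands the range of values, not every integer x is produced by r_forward; for such unused values delta would be 181 or more.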



Non-DCT Transforms 

FIG. 5A illustrates an 8-point Hadamard transform.
Referring to FIG. 5A, rotations 501-512 comprise 2-point
rotations by tan(π/4)=1. Rotation 501 is coupled to receive
input data samples 0 and 7 and generate outputs to rotations
505 and 507. Rotation 502 is coupled to receive input data
samples 1 and 6 and generate outputs to rotations 506 and
508. Rotation 503 is coupled to receive input data samples
2 and 5 and provide outputs to rotations 506 and 508, while
rotation 504 is coupled to receive input data samples 3 and
4 and provide outputs to rotations 505 and 507.

In response to its inputs, rotation 505 generates outputs to
rotations 509 and 510. In response to its inputs, rotation 506
generates outputs to rotations 509 and 510 as well. In
response to these inputs, rotation 509 generates output
samples 0 and 4, and in response to its inputs, rotation 510
generates output samples 2 and 6.

Rotation 507 generates outputs to rotations 511 and 512.
Rotation 508 also generates outputs to rotations 511 and 512.
In response to these inputs, rotation 511 generates output
samples 1 and 5, while rotation 512 generates output
samples 3 and 7.

FIG. 5B shows an 8-point Haar transform. Referring to
FIG. 5B, the Haar transform comprises rotations 520-526
which are each 2-point rotations by tan(π/4)=1. Rotation 520
is coupled to receive input data samples 0 and 1 and generate
output data sample 4 and one output to rotation 524.
Rotation 521 is coupled to receive input data samples 2 and
3 and generate outputs to rotation 524 and the output data
sample 5.

Rotation 522 is coupled to receive input data samples 4
and 5 and generate outputs to rotation 525 and output data
sample 6. Rotation 523 is coupled to receive input data
samples 6 and 7 and generate outputs to rotation 525 and the
output data sample 7. In response to its inputs, rotation 524
generates the output data sample 2 and an output to rotation
526. Rotation 525 generates an output to rotation 526 and
generates output data sample 3. Rotation 526 outputs
samples 0 and 1 in response to its inputs.

FIG. 5C illustrates one embodiment of the 4-point Sine
transform. Referring to FIG. 5C, the Sine transform comprises
rotations 531-534. Rotations 531 and 532 comprise
2-point rotations by tan(π/4)=1, while rotations 533 and 534
comprise rotations by D which is set forth as the tan
(0.1762π).

Rotation 531 is coupled to receive the input data samples
0 and 3 and produce outputs to rotations 533 and 534, while
rotation 532 is coupled to receive input data samples 2 and
1 and generate outputs to rotations 533 and 534. In response
to their respective inputs, rotation 533 generates output data
samples 0 and 2, while rotation 534 generates output
samples 3 and 1.

FIG. 5D illustrates one embodiment of the 4-point Slant
transform. Referring to FIG. 5D, the Slant transform comprises
rotations 540-543. Rotations 540-542 comprise
2-point rotations by tan(π/4)=1, while rotation 543 comprises
a 2-point rotation by tan(0.1024π). Rotation 540 is
coupled to receive input data samples 0 and 3 and provide
outputs to rotations 542 and 543, while rotation 541 is
coupled to receive input data samples 2 and 1 and generate
outputs to rotations 542 and 543. In response to its inputs,
rotation 542 generates output data samples 0 and 2, while
rotation 543 generates output samples 3 and 1. Given these
examples described above, one skilled in the art may implement
other transforms as well.

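Efficient, Reversible 2-point Rotations Using Ladder Filters

Ladder filters can be used to implement any 2-point
rotation in a reversible, efficient, balanced fashion. Ladder
filters have both scale factors equal to 1. The equation below
is a ladder filter decomposition for a determinant 1
(counterclockwise) rotation.

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{\cos\theta - 1}{\sin\theta} & 1 \end{bmatrix} \begin{bmatrix} 1 & \sin\theta \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \frac{\cos\theta - 1}{\sin\theta} & 1 \end{bmatrix}$$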

To be reversible and efficient, each multiplication is
followed by rounding to integer as shown in FIGS. 6 and 7.
To be reversible, multiplications by irrationals are performed
the same in the forward and inverse transforms.

Referring to FIG. 6, a ladder filter implementation is
shown having inputs 610 and 611. Input 610 is coupled to a
multiplier 602 which multiplies input 610 by the quantity of
cos θ - 1 divided by sin θ. The result of the multiplication is
rounded to the nearest integer at block 603. The results of the
rounding are added to input 611 by adder 604. The output of
adder 604 is coupled to multiplier 606 which multiplies the
output of adder 604 by sin θ. The result is rounded to integer
at block 605. The results of the rounding are added to input
610 using adder 601. The output of adder 601 is one output
of the ladder filter, output 612. The output of adder 601 is
also input to multiplier 607 which multiplies the output of
the adder 601 by the quantity of cos θ - 1 divided by sin θ. The
results of the multiplication are rounded to integer by block
608. The results of rounding are added to the output of adder
604 using adder 609.

The output of adder 609 is the other output of the ladder
filter, output 613.

Referring to FIG. 7, two inputs, inputs 701 and 702, are
input into the ladder filter. Input 701 is input to multiplier
703 which multiplies input 701 by

$$\frac{\cos\theta - 1}{\sin\theta}$$

The results of the multiplication are rounded to integer by
block 704. The results of the rounding are subtracted from
input 702 by subtracter 705. The output of subtracter 705 is
coupled to the input of multiplier 706 which multiplies it by
sin θ. The results of the multiplication are rounded to integer
by block 707. The results of the rounding are subtracted
from input 701 by subtracter 708. The output of subtracter
708 is one output of the ladder filter, output 712.

The output of subtracter 708 is also coupled to the input
of multiplier 709 which multiplies it by

$$\frac{\cos\theta - 1}{\sin\theta}$$

The results of the multiplication are rounded to integer by
block 710. The results of the rounding are subtracted from
the output of subtracter 705 by subtracter 711. The output of
subtracter 711 is the other output of the ladder filter, output
713.

The effect of the three rounding operations and the effect
of the precision of implementing the irrational multiplications
cause mismatch error. Ladder filter implementations
often have more mismatch error than other implementations.
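
The data flows of FIGS. 6 and 7 can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: the function names, the use of floating point for the irrational multipliers, and the round-half-up helper are assumptions. The only requirement for reversibility is that the forward and inverse use the same multiplication and rounding.

```python
import math


def _round(v: float) -> int:
    """Round to nearest integer (half up); the same rule is used in both directions."""
    return math.floor(v + 0.5)


def ladder_forward(p: int, q: int, theta: float) -> tuple[int, int]:
    """Three rounded lifting steps, following the FIG. 6 description."""
    t = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    q1 = q + _round(t * p)     # multiplier 602, rounding 603, adder 604
    p1 = p + _round(s * q1)    # multiplier 606, rounding 605, adder 601
    q2 = q1 + _round(t * p1)   # multiplier 607, rounding 608, adder 609
    return p1, q2


def ladder_inverse(p1: int, q2: int, theta: float) -> tuple[int, int]:
    """The same steps in reverse order with subtraction, following FIG. 7."""
    t = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    q1 = q2 - _round(t * p1)   # multiplier 703, rounding 704, subtracter 705
    p = p1 - _round(s * q1)    # multiplier 706, rounding 707, subtracter 708
    q = q1 - _round(t * p)     # multiplier 709, rounding 710, subtracter 711
    return p, q
```

Each step changes only one of the two values by a rounded function of the other, so each step can be undone exactly regardless of how the multiplier is approximated.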

Instead of decomposing the whole DCT into 2x2 
rotations, larger ladder filters can be constructed. For 
example, the 4-point DCT can be implemented as a ladder 
filter by decomposing into three matrices as shown below: 
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35 



c=l + - 

ai =2(-l+ry) 
1 

*i = - 1 + - 
1 

= I - 2n- + — -H 2- + + Zry 

r y y 

2? 
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1 
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:l-2ry 
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' 2 

^l-r^-ry. - 

3 X 
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fi = 

€2 
fl 



40 



These matrices contain the following constants; 
1 



Thus, the present invention provides a novel 4-point
reversible transform, as well as performing an efficient, reversible 2x2
decomposition for the 4 point (4x4) rotation.

It should be noted that although in the above description
the transform includes a 2-point DCT, 4-point APT and an
8-point x 8 non-trivial matrix, the present invention may be
expanded to other sizes such as 16x16, for example.

Look Up Tables for Efficient, Reversible 2-point Rotations

Ladder filters can be used to make an efficient, reversible
implementation for any 2-point rotation. The present invention
provides a look up table based method and apparatus for
controlling rounding to create efficient, reversible transforms
that have improved rounding. Improved rounding
reduces mismatch error. Balanced transforms can be
constructed, and if a transform is balanced, then both scale
factors are 1.

Given a transform with determinant greater than or equal
to 1, a one-to-one or one-to-many mapping of input values to
transformed values is possible. For reversible transforms,
only a one-to-one mapping is of interest. For transforms with
a determinant slightly greater than one, a small number of
the possible transformed values can be unused and the
mapping can be treated as one-to-one.

There are 2-point integer rotations that cannot be made
reversible with any fixed choice of rounding offsets. For
example, consider the following equations, which are a 45°
rotation using the approximation given for APT parameter R
in Table 1.

$$x = \left\lfloor \frac{128a + 128b + f(a,b)}{181} \right\rfloor, \qquad y = \left\lfloor \frac{128a - 128b + g(a,b)}{181} \right\rfloor$$

$$x = \left\lfloor \frac{128(a+b) + f((a+b) \bmod 181,\ (a-b) \bmod 181)}{181} \right\rfloor$$

$$y = \left\lfloor \frac{128(a-b) + g((a+b) \bmod 181,\ (a-b) \bmod 181)}{181} \right\rfloor$$

This is not reversible for any constant rounding offsets.
However, if the rounding offsets are functions of the inputs,
then the function can be made to be reversible. That is, the
rounding varies as a function of the inputs. Furthermore, in
this case, the rounding offsets are only functions of the sum
and difference of the inputs modulo the divisor of 181. An
example of the function is described below in conjunction
with Table 2. Therefore, there are only 181*181=32761 pairs
of rounding offsets.

In one embodiment, the modulo portion of the above 
equation is removed. 

FIG. 8 shows a portion of the mapping of input values to 
output transformed values for the 45° rotation. The above 
equality may be rewritten as 



The number of collisions or the number of extras that do
not have a pair is indicative of the expansion of the transform.

Table 2 shows an example mapping for a 45° rotation using
the approximation 1/√2 ≈ 5/7 = 0.7143.

TABLE 2

Example Mapping

sum, difference   floating point   integer outputs     rounding   sum of
inputs            outputs          for reversibility   offset     squared error
0 0               0.00 0.00        0 0                  0  0      0.000
2 0               1.41 0.00        1 0                  0  0      0.172
4 0               2.83 0.00        3 0                  0  0      0.029
6 0               4.24 0.00        4 0                  0  0      0.059
1 1               0.71 0.71        1 1                  0  0      0.172
3 1               2.12 0.71        2 1                  0  0      0.101
5 1               3.54 0.71        4 1                  0  0      0.302
0 2               0.00 1.41        0 1                  0  0      0.172
2 2               1.41 1.41        2 0                  1 -1      2.343
4 2               2.83 1.41        3 1                  0  0      0.201
6 2               4.24 1.41        3 2                 -1  1      1.887
1 3               0.71 2.12        1 2                  0  0      0.101
3 3               2.12 2.12        2 2                  0  0      0.029
5 3               3.54 2.12        4 2                  0  0      0.230
0 4               0.00 2.83        0 3                  0  0      0.029
2 4               1.41 2.83        1 3                  0  0      0.201
4 4               2.83 2.83        3 3                  0  0      0.059
6 4               4.24 2.83        4 3                  0  0      0.088
1 5               0.71 3.54        1 4                  0  0      0.302
3 5               2.12 3.54        2 4                  0  0      0.230
5 5               3.54 3.54        4 4                  0  0      0.431
0 6               0.00 4.24        0 4                  0  0      0.059
2 6               1.41 4.24        0 2                 -1 -2      7.029
4 6               2.83 4.24        3 4                  0  0      0.088
6 6               4.24 4.24        2 3                 -2 -1      6.574

The sum and difference of the inputs are s and d respectively
(s=a+b, d=a-b). Note the parity of the sum and difference are
the same; that is, both are even or both are odd. The shaded
squares indicate pairs of values that cannot occur since the
parity is not the same. Only the unshaded pairs of values
actually occur. Also shown is the s and d divided by √2 with
normal rounding to integer. The heavy lines group pairs of
values with the same s/√2 and d/√2. The mapping is
one-to-one already for every heavy line region that has a
single unshaded square. Regions with two unshaded squares
indicate problems where "collisions" or "holes" occur,
where normal rounding maps two possible inputs to the
same transform values and they both would give the same
answer. Regions with only a shaded square indicate
"extras," i.e. transform output values that would not be used
with normal rounding. The arrows show how, with proper
rounding, the mapping can be made to be one-to-one by
using the output values for extras for the input values that are
collisions.

For example, where the s and d inputs are 2 and 2, the 
output will not be 1,1. Instead, it will be 0,2 (see arrow 801). 
When performing the inverse, a look-up table entry for 0,2 
would point to output 1,1. The determinant ≥1 condition
guarantees that for each collision, there is at least one extra. 
If the collisions are represented by nearby extras, mismatch 
due to rounding is reduced and may be minimized. 

FIG. 9 shows the collisions ("O") and extras ("+") for a
45° rotation using the approximation 1/√2 ≈ 29/41 = 0.7073.
(The smaller denominator of 41 is used instead of 181 so all
possibilities can be shown on the page. The corresponding
figure for a denominator of 181 is similar.) In this example,
the determinant is very close to 1 (1.0006) and the number of
extras is equal to the number of collisions.



The 5/7 approximation is not very accurate, but the denominator is small
enough that the results of all possible (sum, difference) pairs
of inputs can be listed on a page. For each pair, the sum of
the squared error is listed. Note that except when both
inputs are 0, there is some error even if the best rounding to
integer is used. The average RMSE for rounding pairs to the
closest integer is 1/√12=0.289. The 5/7 approximation has
four collisions out of 25 possible input pairs. These four
collisions increase the RMSE to 0.6529 for this approximation.
The columns entitled "rounding offset" are the output
of look up tables. For the forward transform, the columns
"sum, difference inputs" are the inputs and for the inverse
transform, the columns "integer outputs for reversibility" are
the inputs.
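
The count of collisions and extras for this small example can be reproduced with a short Python sketch. It assumes sums and differences in the range 0 to 6 with matching parity and "normal rounding" of s/√2 and d/√2 to the nearest integer; the variable names are illustrative.

```python
import math
from collections import Counter

# All (sum, difference) pairs in 0..6 that share the same parity.
pairs = [(s, d) for s in range(7) for d in range(7) if (s - d) % 2 == 0]


def normal_round(v: int) -> int:
    """Divide by sqrt(2) and round to the nearest integer."""
    return math.floor(v / math.sqrt(2) + 0.5)


# Count how many input pairs land on each rounded output pair.
hits = Counter((normal_round(s), normal_round(d)) for s, d in pairs)
collisions = sum(n - 1 for n in hits.values() if n > 1)
extras = 5 * 5 - len(hits)   # unused cells of the 5x5 output grid (0..4 each)

print(len(pairs), collisions, extras)
```

With equal numbers of collisions and extras, each colliding input can be redirected to an otherwise unused output, which is what the rounding offset columns encode.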

The 128/181 approximation given in Table 1 is reasonably
accurate. The numerator 128 is a power of 2 which is
computationally useful. The determinant is 1.0002
(log2 1.0002=0.0003 bits) so it is very close to efficient. The
average RMSE with a good look up table is 0.4434 and the
peak error is 1.92.
Forward Computation

A complete forward computation of a rotation for the 2x2
DCT is accomplished as follows. The approximation
1/√2 ≈ 128/181 is assumed for this example. First, in order to
compute the forward, the sum and difference of the inputs a
and b are computed according to the following equations.

sum = a + b
difference = a - b

Next, the sum and difference are divided by 181, saving
the remainder, according to the following equations. (Note
that 128/181 ≈ 181/256 can be used in some implementations
to speed up the division.)

ss = sum / 181
dd = difference / 181
s = sum mod 181
d = difference mod 181

The look up table assumes the pair of modulo 181 values
s,d have the same parity (they are either both even or both
odd). If ss and dd do not have the same parity, then the parity
of one of the modulo values is changed. The change is made
so values stay in the range 0 . . . 180. In the pseudo-code
below, "^" means exclusive OR. This step is needed for odd
denominators; it is not needed for even denominators.

The pseudo code is as follows:

s' = s
d' = d
if (ss is odd and dd is even) or (ss is even and dd is odd)
    if (d = 180)
        s' = s ^ 1
    else
        d' = d ^ 1

x = sqrt(1/2)*s' + LUT_f[s',d'] + 128*ss
y = sqrt(1/2)*d' + LUT_g[s',d'] + 128*dd

"for the equality': 



35 



1 1 

1 1 



Alternatively, the look up table may return both the square
root of 1/2 multiplied by s' (or d') and the rounding offset. In one embodiment,
such look up tables have seven bits of data width.

x = LUT_sqrt1/2_f[s',d'] + 128*ss
y = LUT_sqrt1/2_g[s',d'] + 128*dd

The values of s and d vary from 0 to 180 but not all pairs
occur. In some cases where f and g are one dimensional (1D)
arrays, indexing a 1D look up table with s+181*d would
waste almost half the memory locations. Because 181 is odd,
using s'/2+181*d'/2 does not properly handle boundary
conditions. The following indexing scheme may be used.

index = s'/2 + d'*90 + (d'+1)/2
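
The packing achieved by this indexing scheme can be checked with a short Python sketch, assuming integer (floor) division and same-parity pairs in the range 0 to 180. The function name lut_index is hypothetical.

```python
def lut_index(s: int, d: int) -> int:
    """1D index for a same-parity (s, d) pair, each in 0..180, using integer division."""
    return s // 2 + d * 90 + (d + 1) // 2


# Enumerate every pair that can actually occur (both even or both odd).
indices = sorted(
    lut_index(s, d)
    for s in range(181)
    for d in range(181)
    if (s - d) % 2 == 0
)
print(len(indices), indices[0], indices[-1])
```

Even values of d contribute 91 entries and odd values contribute 90, so the 16381 valid pairs fill a table with no gaps, roughly half the 32761 entries needed by s+181*d.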



FIG. 12 is a block diagram of one embodiment of a rotation
according to the present invention. Referring to FIG. 12,
inputs a and b are added by adder 1201 to produce sum (s),
while inputs a and b are input into subtracter 1202 which
determines the difference (d) of a-b. The sum (s) is input to
divider 1203 which divides it by 181. The remainder output
of divider 1203 is coupled to the inputs of parity correction
block 1205 and the quotient output is coupled to multiplier



The square root of 1/2 multiplied by s and d can be
determined using 128/181 (or 181/256) or a look up table.
The rounding offset may be found in the look up table. In
one embodiment, the rounding offsets are -1 ... 1 so the
data width of the look up tables can be two bits. The square
root of 1/2 multiplied by the portion of the inputs represented by ss and dd is
128ss and 128dd respectively, which may be implemented as
a shift.



1206. The difference output from subtracter 1202 is input to
divider 1204 which divides it by 181 and outputs the
remainder result to the other input of parity correction block
1205 and outputs the quotient to the input to multiplier 1207.
Parity correction block 1205 performs the parity correction
described above and outputs s' and d', which are coupled to
two inputs of look-up table (LUT) 1208 and to multipliers
1209 and 1210 respectively. Multiplier 1206 multiplies the
ss output from divider 1203 by 128 and outputs the result to
adder 1211. Multiplier 1207 multiplies the dd
output of divider 1204 by 128 and outputs the result to adder
1212. Multiplier 1209 multiplies s' by √(1/2) and outputs the
result to adder 1211, while multiplier 1210 multiplies d' by
√(1/2) and outputs the result to adder 1212.

LUT 1208 generates the f and g values as described above
and outputs them to adders 1211 and 1212, respectively.
Adder 1211 adds the inputs together to produce the x output
of the rotation, while adder 1212 adds its inputs to generate
the y output.
Inverse Computation

In one embodiment, in order to compute the complete
inverse, the approximation of 1/√2 ≈ 128/181 is assumed.

First, the inputs x and y are divided by 128, while saving
the remainder, according to the following:

ss = x / 128
dd = y / 128
i = x mod 128
j = y mod 128

Next the modulo 128 values are multiplied by the square
root of two and the rounding offsets are subtracted. (This can
be combined into one LUT.) Because i and j may be any
value from 0 to 127, all (or most if there are unused extras)
look up table entries may be used without needing a fancy
indexing scheme.

s = sqrt(2)*i - LUT_f_inverse[i,j]
d = sqrt(2)*j - LUT_g_inverse[i,j]

Afterwards, compensation is made for the case when the
ss parity is not the same as the dd parity for odd denominators
by using, in one embodiment, the following pseudo
code:

if (ss is odd and dd is even) or (ss is even and dd is odd)
    if (d = 180)
        s' = s ^ 1
    else
        d' = d ^ 1



The sum and difference are computed according to the
following equations:

sum = s' + 181*ss
difference = d' + 181*dd

Lastly, the sum and difference are changed back into
original values according to the following equations:

a = sum/2 + (difference+1)/2
b = sum/2 - difference/2
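
The last step can be illustrated with a minimal Python sketch, assuming the divisions are integer divisions that round towards negative infinity. The function name from_sum_difference is hypothetical.

```python
def from_sum_difference(total: int, difference: int) -> tuple[int, int]:
    """Recover a and b from total = a + b and difference = a - b (same parity)."""
    a = total // 2 + (difference + 1) // 2
    b = total // 2 - difference // 2
    return a, b
```

When both values are odd, the half lost in total/2 is restored by the +1 in the first equation and by the floor of difference/2 in the second.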

The inverse may be implemented in a manner similar to
that shown in FIG. 12 except the direction is reversed. Such
an implementation would be apparent to one skilled in the
art in view of FIG. 12.

Creating Look Up Tables for Rounding Offsets 

An extra is assigned to every hole. For very small look up 
tables, exhaustive search can be used to find the best 
mapping. For larger look up tables, there are several tech- 
niques that can be used to successively refine look up tables. 

The first technique is a deterministic assignment of extras 
to collisions. In one embodiment, the number of extras is no 
less than the number of collisions. The extras are spaced out 
with respect to the collisions to which they are assigned. 
This technique is fast and allows the process to be replicated 
at both ends. This technique also avoids having to transmit 
a look up table as one could be generated on the fly. 



for each collision_row
    determine the number of extra_rows needed to provide an
        extra for every collision in the current collision row
    if a partial extra row is required, select the proper number of
        extras evenly spaced within the row
    sort all the extras to be used in column order
    assign extras to collisions in column order

Another optimization procedure is described in B.
Kernighan and S. Lin, An Efficient Heuristic Procedure for
Partitioning Graphs, Bell Syst. Tech. J., pp. 291-307, 1970.
Optimization may be performed considering only collisions
and extras or considering all input and output values.
Optimization may be performed by swapping pairs of inputs and
outputs or by rotating triplets of inputs and outputs. In one
embodiment, swaps or rotations that reduce squared error
are performed until no more improvements are possible.

The following pseudo code is one embodiment of a high
level optimization method. This illustrates a refinement
procedure. This method allows for swapping pairs (previous
assignments). It allows for swapping an unassigned extra
with the extra in an assigned collision-to-extra pair. It allows
for swapping where the error does not change, i.e., where the
direction of the error is the only issue. A swap that does not
change the error is made when it allows a future swap to
improve error, that is, as part of a triple-swap or rotation of three
pairs.



The spacing that occurs is based on the number of
collisions to extras. By dividing the number of collisions by
the number of extras, an index factor is generated. Rounding
to the next integer for each provides a set of integers
indicative of which collisions to use.

For illustration, consider the pattern of collisions and
extras shown in FIG. 9. There are 144 collisions in 12 rows
of 12 each. There are 144 extras in 8 rows of 9 each and 9
rows of 8 each. The assignments for the first three rows of
collisions are as follows; the remaining rows are assigned in
a similar fashion.



for each collision
    compute squared error of current assignment
do
    for each extra
        initialize extra pair swap to "not swapped yet" (e.g., -1)
        initialize extra triple swap to "not swapped yet" (e.g., -1)
    initialize swap gain to zero
    for k = first swap candidate extra (e.g., 0) to last swap candidate
            extra (e.g., number of extras - 1)
        try and find a better assignment for extra[k]
    if any better assignments are found
        perform swaps
while any better assignments are found



12 collisions in first row
    use first extra row, which has 8 extras in columns 3, 7, 13, 17, 21, 27, 31, 37
    need 4 extras from second extra row, out of a total of 9;
        use the ones in columns 4, 14, 24, 34
    The assignments are (extra row, extra column -> collision column):
        1,3->2; 2,4->6; 1,7->8; 1,13->12; 2,14->16; 1,17->18;
        1,21->22; 2,24->26; 1,27->30; 1,31->32; 2,34->36; 1,37->40

12 collisions in second row
    use remaining 5 extras from second extra row, out of a total of 9;
        use the ones in columns 0, 10, 20, 28, 38
    use 7 extras from the third extra row, out of a total of 8;
        use the ones in columns 3, 7, 13, 17, 21, 27, 31
    The assignments are (extra row, extra column -> collision column):
        2,0->2; 3,3->6; 3,7->8; 2,10->12; 3,13->16; 3,17->18;
        2,20->22; 3,21->26; 3,27->30; 2,28->32; 3,31->36; 2,38->40

12 collisions in third row
    use remaining 1 extra from third extra row, out of a total of 8;
        use the one in column 37
    use 9 extras from the fourth extra row,
        in columns 0, 4, 10, 14, 20, 24, 28, 34, 38
    use 2 extras from fifth extra row, out of a total of 8; use 3, 21
    The assignments are (extra row, extra column -> collision column):
        4,0->2; 5,3->6; 4,4->8; 4,10->12; 4,14->16; 4,20->18;
        5,21->22; 4,24->26; 4,28->30; 4,34->32; 3,37->36; 4,38->40



In order to calculate the squared error, with a rotation by an angle of q, the following procedure may be used.

    let XC be first coordinate of collision
    let YC be second coordinate of collision
    let XE be first coordinate of extra
    let YE be second coordinate of extra
    let SIN = sin q
    let COS = cos q

    squared error = (COS*XC - round(COS*XE))^2 + (COS*YC - round(COS*YE))^2
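The squared error procedure above can be sketched in Python as follows. This is a minimal illustration, not the patent's own code: the function name `squared_error` and the tuple representation of collisions and extras are assumptions, and Python's built-in `round` (which rounds halves to even) stands in for the unspecified rounding rule.

```python
import math

def squared_error(collision, extra, q):
    """Squared error of pairing `collision` with `extra` for rotation angle q (radians).

    `collision` and `extra` are (first coordinate, second coordinate) tuples;
    this representation is an assumption made for the sketch.
    """
    xc, yc = collision
    xe, ye = extra
    cos_q = math.cos(q)
    # Same expression as the procedure in the text, term by term.
    return (cos_q * xc - round(cos_q * xe)) ** 2 + (cos_q * yc - round(cos_q * ye)) ** 2
```

With q = 0 the cosine is 1, so the result reduces to the squared distance between the collision and the rounded extra.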



The pseudo code for one embodiment of the "try and find 
a better assignment for extra[k]" routine is as follows: 






Given a starting mapping, mappings can be improved by gradient descent (swapping extra/collision assignments if mismatch is reduced) or similar optimization procedures (e.g., simulated annealing).
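The swap-based improvement can be sketched as a simple greedy loop. This is a simplified illustration under stated assumptions: it performs only pair swaps that strictly reduce the error (no triple swaps and no zero-gain swaps), the names `refine_by_pair_swaps` and `sq_dist` are hypothetical, and the error function is passed in by the caller.

```python
def refine_by_pair_swaps(collisions, extras, error):
    """Greedy refinement: extras[i] is the extra assigned to collisions[i].

    Swaps the extras of two collisions whenever that strictly lowers the
    summed error, and repeats until a full pass finds no improvement.
    """
    extras = list(extras)
    improved = True
    while improved:
        improved = False
        for k in range(len(extras)):
            for n in range(k + 1, len(extras)):
                current = error(collisions[k], extras[k]) + error(collisions[n], extras[n])
                swapped = error(collisions[k], extras[n]) + error(collisions[n], extras[k])
                if swapped < current:
                    extras[k], extras[n] = extras[n], extras[k]
                    improved = True
    return extras

def sq_dist(c, e):
    """Illustrative error measure: squared distance between two points."""
    return (c[0] - e[0]) ** 2 + (c[1] - e[1]) ** 2
```

Because every accepted swap strictly lowers the total error and there are finitely many assignments, the loop terminates; it may stop at a local minimum, which is why the text also allows error-neutral swaps and triple swaps.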



initialize squared error best_improvement for best swap to zero
search for best pair swap between extra[k] and extras[n, n > k]
if best pair swap has same squared error as without swap
    search for best triple swap between extra[k], extra[n] and
        extras[m, m > n or n > m > k]
    if extra[n] or extra[k] or extra[m] have already been marked for swapping
        ignore best triple swap
if the best swap reduces squared error
    if extra[n] has already been marked for a pair swap
        mark the extra to be swapped with extra[n] as "not swapped yet"
    if extra[n] has already been marked for a triple swap
        mark the extras to be swapped with extra[n] as "not swapped yet"
    if extra[k] has already been marked for a triple swap
        mark the extras to be swapped with extra[k] as "not swapped yet"
    if a triple swap is the best swap and extra[m] has already been marked for a triple swap
        mark the extras to be swapped with extra[n] as "not swapped yet"
    if a pair swap is the best swap
        mark extra[n] to be swapped with extra[k]
        mark extra[k] to be swapped with extra[n]
    else
        mark extra[n] to be triple swapped
        mark extra[k] to be triple swapped
        mark extra[m] to be triple swapped with extra[n] and extra[k]

The pseudo code for one embodiment of the "search for best pair swap" routine is as follows:



for each extra[n]
    calculate swap error = squared error for extra[n], collision[k]
        + squared error for extra[k], collision[n]
    calculate current error = the sum of squared error for the current assignments of extra[n], extra[k]
    this_improvement = current error - swap error
    if (this_improvement >= 0) and (this_improvement >= best_improvement)
        best_improvement = this_improvement
        best swap found so far is n

The pseudo code for one embodiment of the "search for 
best triple swap" routine is as follows: 



-continued 

swap extra[n] and extra[k] 

calculate squared error of current assignment 



An Almost Balanced DCT Implementation

Look up table based rounding allows implementation of arbitrary 2-point rotations with low mismatch. The present invention allows for creating transforms that are efficient and almost balanced and that can then be implemented reversibly with the look up table technique of the present invention (or some other technique).

A balanced efficient transform exists when the determinant of the transform is a perfect square. An almost balanced transform is created by multiplying all the values in the transform matrix by a constant. The constant is selected such that the determinant can be factored into two nearly equal factors. Table 3 shows some examples of almost balanced transforms.
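The factoring step can be illustrated with a short Python sketch. It assumes, for illustration only, that an "a,b transform" has the matrix [[a, b], [b, -a]] with determinant magnitude a^2 + b^2, and that the multiplier is an integer; the function names are hypothetical.

```python
import math

def closest_factor_pair(n):
    """Return (large, small) with large * small == n and the factors as close as possible."""
    small = math.isqrt(n)
    while n % small != 0:
        small -= 1
    return n // small, small

def balance_ratio(a, b, multiplier):
    """Scaled determinant, its closest factor pair, and the ratio of the two factors."""
    det = (multiplier ** 2) * (a * a + b * b)
    large, small = closest_factor_pair(det)
    return det, large, small, large / small
```

Under these assumptions the 2,1 transform with multiplier 2 has determinant 20 = 5 x 4 (ratio 1.25), while the 12,5 transform has determinant 169 = 13 x 13, a perfect square, so it is balanced without any multiplier.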

TABLE 3

Almost Balanced Transforms

[Table contents not recoverable from the source; surviving fragments: a column heading beginning "mul-" and the values 1.04 and 1.17.]


for each extra[n]
    calculate swap error = squared error for extra[m], collision[k]
        + squared error for extra[n], collision[m]
        + squared error for extra[k], collision[n]
    calculate current error = the sum of squared error for the current assignments of extra[m], extra[n], extra[k]
    this_improvement = current error - swap error
    if (this_improvement >= 0) and (this_improvement >= best_improvement)
        best_improvement = this_improvement
        best swap found so far is n



The LCM column in Table 3 contains a least common multiple of the denominators, after any common factors in the numerators are removed. A look up table of size LCM^2 is sufficient to implement the transform. Large values are broken into a quotient and a remainder after division by the LCM. All of the look up tables required for the transforms in Table 3 are too large to be easily understood examples. The 2,1 almost balanced transform with multiplier 2 is described as an example in the equation below.

$$|\det| = 20 = 5 \times 4$$



The pseudo code for one embodiment of the "perform swaps" routine is as follows:

for each extra[k]
    if extra[k] marked
        n = extra to swap with extra[k]
        if n < k



The two divisors are 5 and 4, which gives a balance ratio of 1.25. This example requires only a table size of 10^2 = 100, and the table has a simple structure.

Table 4 is the look up table for the transform. All except the highlighted squares are determined using the following equations. The highlighted squares indicate collisions assigned to extras.

$$x = \left\lfloor \frac{4a + 2b + 2}{5} \right\rfloor$$

$$y = \left\lfloor \frac{2a - 4b + 2}{4} \right\rfloor$$

TABLE 4

Look Up Table for 2,1 Almost Balanced Transform

FIG. 10 is a plot of x,y pairs that occur from the equations above when a and b are in the range 0 ... 10. The circles and arrows show the mapping of collisions to extras. Appendix A contains one embodiment of source code which implements this transform. It includes both the look up table and the quotient/remainder processing needed to handle arbitrarily large values.
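The quotient/remainder processing can be sketched as follows. This is an illustration under stated assumptions: the 10 x 10 table here holds only the plain rounded values floor((4a + 2b + 2)/5) and floor((2a - 4b + 2)/4), without the collision-to-extra reassignments of the actual table, and the names `forward_small`, `TABLE` and `forward_any` are hypothetical. The offsets 8, 4, 5 and -10 are the ones used in the range check of Appendix A.

```python
def forward_small(a, b):
    """Rounded outputs of the 2,1 transform with multiplier 2 (no collision reassignment)."""
    return (4 * a + 2 * b + 2) // 5, (2 * a - 4 * b + 2) // 4

# Look up table indexed by the remainders of a and b after division by 10.
TABLE = {(a, b): forward_small(a, b) for a in range(10) for b in range(10)}

def forward_any(a, b):
    """Handle arbitrary integers: split into quotient and remainder, then offset the table entry."""
    ad, am = divmod(a, 10)   # floor division, like mydiv() in Appendix A
    bd, bm = divmod(b, 10)
    xm, ym = TABLE[(am, bm)]
    return 8 * ad + 4 * bd + xm, 5 * ad - 10 * bd + ym
```

The decomposition is exact because adding 10 to a changes the numerators by 40 and 20, which are multiples of the divisors 5 and 4, and adding 10 to b changes them by 20 and -40.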
8x8 Transforms 

A variety of the building blocks described above may be used in the various 8x8 reversible APTs, some of which are shown in Table 5. The Chen decomposition of the subsidiary matrix shown in FIG. 3 is used except for the APT labeled Hein, which uses the subsidiary matrix shown in FIG. 4. The "efficient" and "efficient Hein" APTs use the building blocks of the reversible implementations described above that do not have internal rounding or look up tables, except for the "1" and "R" in the subsidiary matrix, which is done with a look up table. Another APT uses ladder filter building blocks. The "almost efficient" APT is closer to balanced, which leads to good lossy performance. The "almost efficient" APT has determinant 1.04 (log2 1.04 = 0.06 bits of redundancy).

TABLE 5

Building blocks used to create 8x8 reversible APTs

APT parameter                                 efficient                   efficient Hein   Ladder filter   "almost efficient"
A                                             5,1 transform, unbalanced   12,5 transform   Ladder          5,1 transform, balanced, inefficient
B                                             12,5 transform              12,5 transform   Ladder          12,5 transform
C                                             3,2 transform               3,2 transform    Ladder          3,2 transform, growth in sum
1 (outside subsidiary matrix)                 S-transform                 S-transform      Ladder          LUT
1 and R (inside subsidiary matrix, before R)  LUT                         LUT              Ladder          LUT
1 (inside subsidiary matrix, after R)         S-transform                 S-transform      Ladder          LUT


In one embodiment, a finite state machine (FSM) entropy coder and a trivial context model losslessly code and decode with the various transforms, such as shown in FIG. 1C. An example of an FSM coder is shown in U.S. Pat. Nos. 5,272,478, 5,363,099 and 5,475,388, each of which is incorporated by reference.

Table 6 shows the growth in the size of coefficients (number of bits) for the 1D 8-point efficient reversible APT.

TABLE 6

Growth in size of coefficients for 8-point efficient reversible APT

INPUT     0   1   2   3   4   5   6   7
GROWTH   +0  +0  +2  +2  +1  +6  +2  +5



As an example for this transform, if the inputs are 8 bits, the total of 64 bits of input would grow by 18 bits and would result in 82 bits of output. The growth for the 2D 8x8 transform can be determined by applying the 1D results horizontally and vertically. Table 7 shows the growth in the size of coefficients for the 1D 8-point "almost efficient" reversible APT.



TABLE 7

Growth in size of coefficients for 8-point "almost efficient" reversible APT

INPUT     0   1   2   3   4   5   6   7
GROWTH   +2  +2  +2  +8  +2  +1  +2  +2



As an example for this transform, if the inputs are 8 bits, the total of 64 bits of input would grow by 21 bits and would result in 85 bits of output (when adding the growth from all coefficients together). Also for example, in the 2-D case where 1D results are applied horizontally and vertically, for horizontal coefficient 2 and vertical coefficient 3 there is an additional 10 bits (as both are added, 2+8=10). The good compression results are due to having no redundant least significant bits; the growth is mostly in easy to compress more significant bits.
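The arithmetic behind these totals can be checked with a few lines of Python. The growth lists are taken from Tables 6 and 7; the function names are illustrative only.

```python
GROWTH_EFFICIENT = [0, 0, 2, 2, 1, 6, 2, 5]         # Table 6
GROWTH_ALMOST_EFFICIENT = [2, 2, 2, 8, 2, 1, 2, 2]  # Table 7

def output_bits_1d(growth, input_bits=8):
    """Total output bits of the 1D 8-point transform for inputs of `input_bits` each."""
    return sum(input_bits + g for g in growth)

def growth_2d(growth, horizontal, vertical):
    """Extra bits for one 2D coefficient: horizontal growth plus vertical growth."""
    return growth[horizontal] + growth[vertical]
```

For 8-bit inputs, `output_bits_1d` gives 82 bits for the efficient APT and 85 bits for the "almost efficient" APT, and `growth_2d(GROWTH_ALMOST_EFFICIENT, 2, 3)` gives the 10 additional bits mentioned in the text.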

To be reversible, an APT must output different coefficients than a floating point DCT, so some mismatch is unavoidable. However, a reversible APT is lossless without quantization. The lossless feature allows for no systemic error if the inverse transform is the inverse reversible APT. If required by an application, reversible APT coefficients could be inverse transformed to the original pixels and then forward transformed with a floating point DCT if exact DCT coefficients were needed. This would again lead to no mismatch.

Tables 8-10 show minimum quantization matrices for various 8x8 APTs. Minimum quantization matrices set forth an amount, or more, of quantization that if applied would result in the reversible APT coefficients differing from true DCT coefficients by no more than ±1. The smaller the minimum quantization values, the less mismatch error in the transform. The DC quantizer is shown in the upper left corner and the AC quantizers are in standard DCT (not zig-zag) order. The 8x8 efficient reversible APT and the ladder filter based APT both have relatively large minimum quantizers for DC and coefficients near DC. These transforms would therefore only approximate the DCT well at low compression/high quality. The "almost efficient" APT has smaller minimum values in general and has much smaller values for DC and near DC. This APT approximates the DCT well at typical JPEG compression ratios.

The "almost efficient" APT has the most mismatch 
(highest minimum quantizers) in coefficients generated with 
the "C" APT parameter. A look up table based "C" 2-point 
rotation might further reduce mismatch. 



TABLE 8 



Minimum quantization matrix for 8x8 efficient reversible APT
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5 
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4 


4 


1 


2 


2 


7 
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4 


2 


3 


2 
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5 


4 


2 


3 
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3 


3 
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2 


2 


6 
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4 


4 


2 


2 


2 
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11 
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4 


4 


2 


3 
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TABLE 9 



Minimum quantization matrix for
8x8 efficient reversible APT using ladder filter
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13 
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7 
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7 
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6 


6 


6 
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The structure of the minimum quantization matrices for 
some transforms is explained by the structure of the APT scale factor matrices. The ladder filter implementation is an exception; all values in its scale factor matrix are 1. Tables 11 and 12 show the scale factors for the 8x8 efficient reversible APT and the "almost efficient" version. Large scale factors (greater than 1) result in large minimum quantization values.

TABLE 10

Minimum quantization matrix for 8x8 "almost efficient" reversible APT

TABLE 11

Scale factors for 8x8 efficient reversible APT

 8.00  14.42   2.83   5.10   4.00   0.39   2.83   0.55
14.42  26.00   5.10   9.19   7.21   0.71   5.10   1.00
 2.83   5.10   1.00   1.80   1.41   0.14   1.00   0.20
 5.10   9.19   1.80   3.25   2.55   0.25   1.80   0.35
 4.00   7.21   1.41   2.55   2.00   0.20   1.41   0.28
 0.39   0.71   0.14   0.25   0.20   0.02   0.14   0.03
 2.83   5.10   1.00   1.80   1.41   0.14   1.00   0.20
 0.55   1.00   0.20   0.35   0.28   0.03   0.20   0.04



TABLE 12

Scale factors for 8x8 "almost efficient" reversible APT

 1.00   0.98   1.00   0.28   1.00   3.61   1.00   0.98
 0.98   0.96   0.98   0.27   0.98   3.54   0.98   0.96
 1.00   0.98   1.00   0.28   1.00   3.61   1.00   0.98
 0.28   0.27   0.28   0.08   0.28   1.00   0.28   0.27
 1.00   0.98   1.00   0.28   1.00   3.61   1.00   0.98
 3.61   3.54   3.61   1.00   3.61  13.00   3.61   3.54
 1.00   0.98   1.00   0.28   1.00   3.61   1.00   0.98
 0.98   0.96   0.98   0.27   0.98   3.54   0.98   0.96



Lossy Coding

Lossy coding with the reversible APT starts with lossless encoding using the reversible APT. The decoding is lossy and may use a legacy DCT based decompressor such as a JPEG decoder.

Reversible APT coefficients may be used in a lossy compression system such as JPEG in the same manner as regular APT coefficients. A JPEG quantization matrix is chosen. Each quantizer is divided by the corresponding APT scale factor, resulting in a new combined quantizer and scale factor. The APT and the combined quantization and scale factor matrix are used as a replacement for the DCT and quantization in JPEG. Any quantization matrix can be used; however, mismatch will occur if the scale factor is larger than the quantizer.
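A minimal Python sketch of the combined quantizer and scale factor follows. The function names and the small 2 x 2 matrices in the usage note are illustrative assumptions, not values from the patent, and round-to-nearest is assumed for the quantization itself.

```python
def combined_quantizers(quant, scale):
    """Divide each JPEG quantizer by the corresponding APT scale factor."""
    return [[q / s for q, s in zip(q_row, s_row)]
            for q_row, s_row in zip(quant, scale)]

def mismatch_positions(quant, scale):
    """Positions (row, column) where the scale factor is larger than the quantizer."""
    return [(r, c)
            for r, (q_row, s_row) in enumerate(zip(quant, scale))
            for c, (q, s) in enumerate(zip(q_row, s_row))
            if s > q]

def quantize(coefficient, combined):
    """Quantize one reversible APT coefficient with its combined quantizer (round to nearest)."""
    return round(coefficient / combined)
```

For example, with the hypothetical quantizers [[16, 11], [12, 2]] and scale factors [[1.00, 0.98], [0.98, 3.61]], only position (1, 1) is flagged, because 3.61 exceeds the quantizer 2.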

In an alternative embodiment, the quantization division/multiplication is replaced with shifting to select desired bits. This reduces computational cost. It allows an embedded or multi-use system where more bits can be selected for higher quality, up to lossless when all bits are selected. The quantizers are chosen such that when they are divided by the corresponding scale factor, they are a power of 2 (or approximately a power of 2).

JPEG has a progressive mode called successive approximation. (This mode is less well known and less frequently used than the baseline sequential mode of JPEG.) An alignment scheme can be chosen that results in a particular JPEG quantization. This can be used to generate coded data for the first stage of successive approximation, which can be very similar to baseline sequential data if spectral selection is not used. Successive approximation allows the remaining data to be coded by bitplanes in an embedded fashion. Progressive JPEG also has spectral selection. Spectral selection allows bitplanes of only specified coefficients to be coded. Spectral selection can be used to specify which coefficients have bits in a bitplane versus coefficients which have already been fully described. If large quantization values were chosen for the first stage, all (or almost all) of the coefficients would be bitplane coded.

If using JPEG progressive mode is not desired, transcoding can be used to create sequential lossless JPEG codestreams of different fidelities. APT coefficients can be coded losslessly with some method, not necessarily JPEG compatible. To create a stream, lossless decoding is performed, and a desired quantization is performed either by division or shifting. The quantized coefficients can then be coded in a JPEG compatible way. There is a computational savings over lossless coding methods that do not use the reversible APT since no DCT is required during transcoding.

Table 13 shows an example of bits to shift right for each APT coefficient using the "almost efficient" 8x8 reversible APT. This corresponds to a quantizer/scale factor of 2^n, where n is the number of bits to shift right. Table 14 is the equivalent JPEG DCT quantization matrix that the shifts in Table 13 implement. Table 14 is similar to the psychophysically weighted luminance quantization tables typically used with JPEG.
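The relation between a shift and its equivalent quantizer can be sketched in Python. The function names are hypothetical; the relation itself (quantizer divided by scale factor equals 2^n) is the one stated in the text, and rounding the product to the nearest integer is an assumption of this sketch.

```python
def equivalent_quantizer(shift, scale):
    """JPEG-style quantizer implemented by shifting right `shift` bits at a given scale factor."""
    return round((1 << shift) * scale)

def shift_quantize(coefficient, shift):
    """Quantize an integer coefficient by discarding its `shift` least significant bits."""
    return coefficient >> shift
```

A shift of 4 bits at scale factor 1.00 corresponds to a quantizer of 16, and a shift of 7 bits at scale factor 0.98 corresponds to about 125. Note that Python's `>>` on a negative integer rounds toward minus infinity, which differs from the round-to-nearest behavior of division-based quantizers.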
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Tables 15 and 16 show the bits to shift and corresponding 
quantizers for close to uniform quantization using the
"almost efficient" 8x8 reversible APT.

TABLE 13 



Bits to shift right for "psychovisual"
shift based quantization
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6 


4 


8 


4 


4 


7 


7 


7 


S 


7 


4 


6 


7 


7 


7 


7 


9 


7 


5 


7 


7 



TABLE 14 



Quantization matrix for "psychovisual" shifts 
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16 


16 
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16 
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16 


16 


16 
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16 


17 


16 


14 


31 


30 


32 


31 


16 


18 


16 
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31 
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72 


41 


72 
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72 


69 


64 


63 


64 


72 


128 


58 


64 


63 


115 


57 


58 


64 


58 


104 


58 


57 


128 


125 


128 


72 


128 


58 


64 


125 


125 


123 


125 


138 


125 


113 


125 


123 



Uniform quantization gives the best rate/distortion according to the mean squared error (MSE) metric.



TABLE 15 



Bits to shift right for "normalized" 
shift based quantization 
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TABLE 16 



Quantization matrix of "normalized" shifts 
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14 
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14 
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14 


16 


16 


14 
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14 


16 


14 
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14 


14 


16 


16 


16 


18 


16 


14 


16 


16 


16 


15 


16 


17 


16 


14 


16 


15 



Implementation Issues

The reversible APT has a higher computation cost than a regular APT because scaling and rounding are performed at each step. To partially compensate for this disadvantage, the register width for the reversible APT is reduced at every step. Small register width and the simple parameters used in calculations aid implementation. In software, multiplication and division operations can be replaced by look up tables. In hardware, dedicated, low-hardware-cost multiply-by-N and divide-by-N circuits can be used.

For example, consider the implementation of part of the "B" 2-point rotation with two lookup tables described below and shown in FIG. 11. In hardware, the two look-up tables could be replaced with dedicated logic.

$$x = \left\lfloor \frac{12a + 5b + 6}{13} \right\rfloor$$

LUT1: given a, returns

$$d_1 = \left\lfloor \frac{12a}{13} \right\rfloor$$

and r1 = (12a) mod 13. LUT2: given b, returns

$$d_2 = \left\lfloor \frac{5b + 6}{13} \right\rfloor$$

and r2 = (5b + 6) mod 13.

This produces the following results:

x = d1 + d2 when r1 + r2 < 13

x = d1 + d2 + 1 when r1 + r2 >= 13
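The two-table scheme can be sketched in Python as follows. This is an illustration, not the hardware of FIG. 11: the function names are hypothetical, the tables are built for non-negative inputs below an assumed `size`, and each table entry stores the quotient and remainder as a pair.

```python
def build_luts(size):
    """LUT1[a] = (floor(12a / 13), 12a mod 13); LUT2[b] = (floor((5b + 6) / 13), (5b + 6) mod 13)."""
    lut1 = [divmod(12 * a, 13) for a in range(size)]
    lut2 = [divmod(5 * b + 6, 13) for b in range(size)]
    return lut1, lut2

def rotate_x(a, b, lut1, lut2):
    """x = floor((12a + 5b + 6) / 13), computed from two table lookups and a carry."""
    d1, r1 = lut1[a]
    d2, r2 = lut2[b]
    # The two remainders sum to at most 24, so at most one carry is needed.
    return d1 + d2 + (1 if r1 + r2 >= 13 else 0)
```

The design choice is that neither table depends on both inputs, so two tables of `size` entries replace a single table of `size` squared entries, at the cost of one addition and one comparison.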



Appendix A. Almost balanced 2,1 transform source code

The following "awk" source code implements the almost balanced 2,1 transform using a look up table as described above.

#!/usr/bin/nawk -f $0 $*

# Copyright 1996, 1997 RICOH

# integer division followed by "floor" rounding
function mydiv(n, d) {
    m = int(n/d);
    if ((m*d != n) && (n < 0)) {
        return m-1;
    } else
        return m;
}

BEGIN {
    for (a = 0; a < 10; a++) {
        for (b = 0; b < 10; b++) {
            x = mydiv(4*a + 2*b + 2, 5);
            y = mydiv(2*a - 4*b + 2, 4);
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-continued 



Referring to FIG. 11, the LUTs 1201 and 1202 operate as follows: LUT1 is indexed by a, and LUT2 is indexed by b.



Reversible transforms for unified lossy and lossless compression are extended to include the discrete cosine transform (DCT), the most popular transform for image coding. The reversible Allen Parameterized Transform (APT) implements the DCT as a cascade of "integer rotations," each of which is implemented reversibly. The entropy of reversible APT coefficients was found to be similar to the entropy of reversible wavelet coefficients. An "almost balanced" reversible APT has sufficiently low mismatch to the floating point DCT so a legacy JPEG decoder can be used for lossy decompression.

Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that the particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of the preferred embodiment are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.

Thus, a reversible DCT-based system has been described. 



} 



            # fix up boundaries so it tiles correctly
            if ((x == 8) && (y == 4))

            if ((x == 2) && (y == -6))

            # match internal collisions and holes
            if (hit[x y] == 1) {
                x1 = x;

                if (hit[x y] == 1) {
                    print "ERROR";
                    exit 1;
                }

                hit[x y] = 1;
                a1 = lut_a[x1 y];
                b1 = lut_b[x1 y];

                e1 = x1 - ((4.0*a1 + 2.0*b1) / 5.0);
                e0 = x - ((4.0*a + 2.0*b) / 5.0);
                e1s = x - ((4.0*a1 + 2.0*b1) / 5.0);
                e0s = x1 - ((4.0*a + 2.0*b) / 5.0);
                if (e1s*e1s + e0s*e0s < e1*e1 + e0*e0) {
                    # swap assignments
                    lut_a[x y] = a1;
                    lut_b[x y] = b1;
                    lut_x[a1 b1] = x;
                    lut_y[a1 b1] = y;
                    lut_a[x1 y] = a;
                    lut_b[x1 y] = b;
                    lut_x[a b] = x1;
                    lut_y[a b] = y;
                }
            } else {
                hit[x y] = 1;
                lut_a[x y] = a;
                lut_b[x y] = b;
                lut_x[a b] = x;
                lut_y[a b] = y;
            }

        }
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# print"tiie mapping - - _ _ _ _ 

    for (a = 0; a < 10; a++) {
        for (b = 0; b < 10; b++) {
            x = lut_x[a b];
            y = lut_y[a b];
            e1 = x - (4.0*a + 2.0*b) / 5.0;
            e2 = y - (2.0*a - 4.0*b) / 4.0;
            error = e1*e1 + e2*e2;
            print x "," y "\t->" a "," b "\t" error;
        }
        print "";
    }

    # Check mapping for integers outside of 0 ... 9
    lim = 25;
    for (a = -lim; a < lim + 1; a++) {
        ad = mydiv(a, 10);
        am = a - ad*10;
        for (b = -lim; b < lim + 1; b++) {
            bd = mydiv(b, 10);
            bm = b - bd*10;
            xm = lut_x[am bm];
            ym = lut_y[am bm];
            x = 8*ad + 4*bd + xm;
            y = 5*ad - 10*bd + ym;
            print x "," y "\t<-" a "," b;
            if (hit2[x,y] != "") {
                print "ERROR: " x "," y ";" a "," b " & " hit2[x,y];
            }
            hit2[x,y] = x "," y ";" a "," b;
        }
        print "";
    }
}



We claim: 

1. A compressor having a reversible Discrete Cosine Transform (DCT), wherein the DCT comprises a plurality of 2 point rotations.
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2. The compressor defined in claim 1 wherein a plurality 
of 2 point rotations comprise transforms. 

3. The compressor defined in claim 2 wherein each of the 
transforms has balanced scale factors. 

4. The compressor defined in claim 3 wherein the scale 
factors of both outputs of individual transforms are equal. 

5. The compressor defined in claim 3 wherein the product 
of the scale factors for two outputs of a transform is 1. 

6. The compressor defined in claim 3 wherein both scale factors of outputs of a transform are less than 1.

7. The compressor defined in claim 1 wherein each of the plurality of 2-point rotations is reversible.

8. The compressor defined in claim 7 wherein the plurality of 2-point rotations has no internal rounding.

9. The compressor defined in claim 7 wherein at least one 
of the 2-point rotations comprises an S-transform. 

10. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises an unbalanced 5,1 
transform. 

11. The compressor defined in claim 1 wherein at least one 
of the 2-point rotations comprises a balanced 5,1 transform. 

12. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises a 60,11 transform. 

13. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises a 12,5 transform. 

14. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises a 3,2 transform. 

15. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises a 4,3 transform. 

16. The compressor defined in claim 1 wherein at least 
one of the 2-point rotations comprises a ladder filter. 

17. The compressor defined in claim 16 wherein results of 
each multiplication performed in the ladder filter are 
rounded to an integer value. 

18. The compressor defined in claim 1 wherein at least one of the 2-point rotations has rounding offsets that are a function of its inputs.

19. The compressor defined in claim 1 wherein at least one of the 2-point rotations has rounding offsets that are functions of the sum and difference of its inputs modulo a divisor.

20. A compressor having a reversible Discrete Cosine Transform (DCT) means for transforming information into coefficients, wherein the DCT means comprises a plurality of 2 point rotations.
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21. The compressor defined in claim 20 wherein a plu- 
rality of 2 point rotations comprise transforms. 

22. The compressor defined in claim 21 wherein each of 
the transforms has balanced scale factors. 

23. The compressor defined in claim 22 wherein the scale factors of both outputs of individual transforms are equal.

24. The compressor defined in claim 22 wherein the product of the scale factors for two outputs of a transform is 1.

25. The compressor defined in claim 22 wherein both scale factors of outputs of a transform are less than 1.

26. The compressor defined in claim 20 wherein each of the plurality of 2-point rotations is reversible.

27. The compressor defined in claim 26 wherein the plurality of 2-point rotations has no internal rounding.

28. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises an S-transform.

29. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises an unbalanced 5,1 transform.

30. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a balanced 5,1 transform.

31. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a 60,11 transform.

32. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a 12,5 transform.

33. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a 3,2 transform.

34. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a 4,3 transform.

35. The compressor defined in claim 20 wherein at least one of the 2-point rotations comprises a ladder filter.

36. The compressor defined in claim 35 wherein results of each multiplication performed in the ladder filter are rounded to an integer value.

37. The compressor defined in claim 20 wherein at least one of the 2-point rotations has rounding offsets that are a function of its inputs.

38. The compressor defined in claim 20 wherein at least one of the 2-point rotations has rounding offsets that are functions of the sum and difference of its inputs modulo a divisor.

* * * * *
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